Learning Morphology of Romance, Germanic and Slavic Languages with the Tool Linguistica

Author

  • Helena Blancafort
Abstract

In this paper we present preliminary work on the semi-automatic induction of inflectional paradigms from non-annotated corpora using the open-source tool Linguistica (Goldsmith 2001), which can be used without any prior knowledge of the language. The aim is to induce morphological information from corpora so as to compare languages and anticipate the difficulty of developing morphosyntactic lexica. We report on a series of corpus-based experiments run with Linguistica on Romance languages (Catalan, French, Italian, Portuguese and Spanish), Germanic languages (Dutch, English and German) and the Slavic language Polish. For each language we obtained interesting clusters of stems sharing the same suffixes. These clusters can be seen as mini inflectional paradigms that also include productive derivational suffixes. We ranked the results for each language by paradigm size (the maximum number of suffixes per stem). The results show that the tool is useful for getting a first idea of the role and complexity of inflection and derivation in a language and for comparing it with other languages, and that it could help build lexicographic resources from scratch. Still, special post-processing is needed to address the tool's two principal drawbacks: it draws no clear distinction between inflection and derivation, and it does not take allomorphy into account.
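The stem clusters described above correspond to what Linguistica calls signatures: a set of suffixes shared by a set of stems. As a rough illustration only, the Python sketch below groups stems by the set of suffixes they occur with and ranks those sets by size. The function `naive_signatures` and its parameters are hypothetical, invented for this sketch; it tries every split point within fixed length limits and does not reproduce Linguistica's MDL-based segmentation.

```python
from collections import defaultdict


def naive_signatures(words, min_stem_len=3, max_suffix_len=4, min_stems=2):
    """Group stems by the set of suffixes they combine with (a 'signature').

    Hypothetical, naive sketch: every split point within the length limits
    is tried and no MDL-based pruning is applied (unlike Linguistica).
    Returns (signature, stems) pairs ranked by number of suffixes.
    """
    stem_suffixes = defaultdict(set)
    for w in set(words):
        start = max(min_stem_len, len(w) - max_suffix_len)
        # i == len(w) yields the null suffix "" (the whole word as a stem)
        for i in range(start, len(w) + 1):
            stem_suffixes[w[:i]].add(w[i:])

    # Collect stems under their suffix set; one suffix is not a paradigm
    signatures = defaultdict(set)
    for stem, suffixes in stem_suffixes.items():
        if len(suffixes) >= 2:
            signatures[frozenset(suffixes)].add(stem)

    # Keep signatures shared by enough stems; largest paradigms first
    ranked = [(sig, stems) for sig, stems in signatures.items()
              if len(stems) >= min_stems]
    ranked.sort(key=lambda item: (-len(item[0]), sorted(item[0])))
    return ranked


words = ["canto", "cantas", "canta", "hablo", "hablas", "habla"]
for sig, stems in naive_signatures(words):
    print(sorted(stems), "+", sorted(sig))
# ['cant', 'habl'] + ['a', 'as', 'o']
# ['canta', 'habla'] + ['', 's']
```

The second line of output shows why a naive splitter is not enough: it also proposes the competing analysis canta/habla + {∅, s}, and choosing among such segmentations is what Linguistica's MDL criterion is for. Like Linguistica, this sketch also makes no distinction between inflectional and derivational suffixes and ignores allomorphy.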





Publication date: 2010